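Online knowledge distillation transfers knowledge among all student models to reduce the reliance on pre-trained models. However, existing online methods rely heavily on the prediction distributions and neglect further exploration of representational knowledge. In this paper, we propose a novel Multi-scale Feature Extraction and Fusion method (MFEF) for online knowledge distillation, which comprises three key components: multi-scale feature extraction, dual attention, and feature fusion, in order to generate more informative feature maps for distillation. The multi-scale feature extraction uses divide-and-concatenate in the channel dimension to improve the multi-scale representation ability of the feature maps. To obtain more accurate information, we design a dual attention that adaptively strengthens the important channels and spatial regions. Moreover, we aggregate and fuse the previously processed feature maps via feature fusion to assist the training of the student models. Extensive experiments on CIFAR-10, CIFAR-100, and CINIC-10 show that MFEF transfers more beneficial representational knowledge for distillation and outperforms alternative methods across various network architectures.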
translated by Google Translate
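Filter pruning introduces structural sparsity by removing selected filters and is therefore particularly effective for reducing complexity. Previous works prune networks empirically, on the view that filters with smaller norms contribute less to the final result. However, such criteria have been shown to be sensitive to the distribution of filters, and the accuracy may be hard to recover because the capacity gap is fixed once the network is pruned. In this paper, we propose a novel filter pruning method called Asymptotic Soft Cluster Pruning (ASCP), which identifies the redundancy of a network based on the similarity of its filters. Each filter of the over-parameterized network is first distinguished by clustering and then reconstructed to manually introduce redundancy into it. Several clustering guidelines are proposed to better preserve the feature extraction ability. After reconstruction, the filters are allowed to update, which eliminates the effect of mistaken selections. In addition, various decay strategies for the pruning rate are adopted to stabilize the pruning process and improve the final performance. By gradually generating more identical filters within each cluster, ASCP can remove them through a channel addition operation with almost no accuracy drop. Extensive experiments on the CIFAR-10 and ImageNet datasets show that our method achieves competitive results compared with many state-of-the-art algorithms.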
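The channel addition step mentioned above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a toy two-layer network (a linear layer, a ReLU, and a second linear layer) written in plain Python, and the helper names `forward` and `merge_identical` are hypothetical. It shows only why removing one of two identical filters loses nothing when the next layer's weights for the removed channel are added onto the kept channel.

```python
# Illustrative sketch only; `forward` and `merge_identical` are hypothetical names.

def relu(x):
    return max(0.0, x)

def forward(w1, w2, x):
    """Toy network: hidden = relu(w1 @ x), output = w2 @ hidden.

    w1 is a list of first-layer filters (each a list of input weights).
    w2 is a list of output rows; w2[o][c] weights hidden channel c.
    """
    hidden = [relu(sum(wi * xi for wi, xi in zip(f, x))) for f in w1]
    return [sum(row[c] * hidden[c] for c in range(len(hidden))) for row in w2]

def merge_identical(w1, w2, keep, drop):
    """Remove filter `drop`, which must equal filter `keep`.

    Identical filters produce identical activations, so the next layer's
    weights for the dropped channel are added onto the kept channel.
    """
    assert w1[keep] == w1[drop], "filters must be identical to merge exactly"
    new_w1 = [f for i, f in enumerate(w1) if i != drop]
    new_w2 = []
    for row in w2:
        merged = list(row)
        merged[keep] += merged[drop]  # channel addition
        del merged[drop]              # the dropped channel no longer exists
        new_w2.append(merged)
    return new_w1, new_w2
```

For example, with three filters of which the first and last are identical, `merge_identical(w1, w2, keep=0, drop=2)` returns a two-filter network whose outputs match the original for any input. Filters that are only nearly identical would give an approximate match.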
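Unsupervised person re-identification is a challenging and promising task in computer vision. Recent unsupervised person re-identification methods have made great progress by training with pseudo labels. However, how to purify feature noise and label noise in an unsupervised manner has received little explicit study. To purify the features, we take into account two additional types of features from different local views to enrich the feature representation. The proposed multi-view features are carefully integrated into our cluster contrast learning to exploit more discriminative cues that the global feature easily ignores or represents with bias. To purify the label noise, we propose to take advantage of the knowledge of a teacher model in an offline scheme. Specifically, we first train a teacher model on the noisy pseudo labels and then use the teacher model to guide the learning of our student model. In our setting, the student model converges quickly under the supervision of the teacher model, which reduces the interference of the noisy labels that strongly affected the teacher model. After carefully handling the noise and bias in feature learning, our purification modules prove to be very effective for unsupervised person re-identification. Extensive experiments on three popular person re-identification datasets demonstrate the superiority of our method. In particular, our approach achieves a state-of-the-art accuracy of 85.8% mAP and 94.5% Rank-1 on the challenging Market-1501 benchmark under the fully unsupervised setting. The code will be released.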
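Recently, deep neural networks (DNNs) have become very prominent in many fields, such as computer vision (CV) and natural language processing (NLP), owing to their superior feature extraction performance. However, high-dimensional parameter models and large-scale mathematical computation limit execution efficiency, especially on Internet of Things (IoT) devices. In contrast to the previous cloud/edge-only pattern, which puts huge pressure on uplink communication, and the device-only pattern, which demands an unaffordable amount of computation, we highlight collaborative computation between the device and the edge for DNN models, which can achieve a good balance between communication load and execution accuracy. Specifically, we propose a systematic on-demand co-inference framework that exploits a multi-branch structure, in which a pre-trained AlexNet is right-sized through early exit and partitioned at an intermediate DNN layer. Integer quantization is applied to further compress the transmitted bits. We then establish a new deep reinforcement learning (DRL) optimizer, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point, the partition point, and the number of compression bits through soft policy iteration. With a latency- and accuracy-aware reward design, this optimizer adapts well to complex environments, such as dynamic wireless channels and arbitrary CPU processing, and is capable of supporting 5G URLLC. Real-world experiments on a Raspberry Pi 4 and a PC show that the proposed solution outperforms the alternatives.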
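To make the compression step concrete, the sketch below shows a generic min-max uniform integer quantizer applied to a list of intermediate feature values before transmission. The abstract does not specify the quantization scheme actually used, so this is only an assumed, textbook variant; the function names `quantize` and `dequantize` and the `bits` parameter are hypothetical.

```python
# Illustrative sketch only: a generic min-max uniform quantizer,
# not necessarily the scheme used in the paper.

def quantize(values, bits):
    """Map floats to integer codes in [0, 2**bits - 1] with a uniform step."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # Guard against a constant input, where hi == lo would give a zero step.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate floats on the receiving side."""
    return [lo + c * scale for c in codes]
```

Fewer bits mean a smaller payload on the uplink but a larger reconstruction error (at most about half a quantization step per value), which is the kind of trade-off an optimizer that chooses the number of compression bits has to weigh.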
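Enabling robots to work in close proximity to humans requires a control framework that not only incorporates multi-sensory information for autonomous and coordinated interaction but also provides perception-aware task planning to ensure adaptive and flexible collaborative behavior. In this study, an intuitive stack-of-tasks (iSoT) formulation is proposed that defines the robot's actions by considering the human arm posture and the task progress. The framework is augmented with visuo-tactile information to perceive the collaborative environment effectively and to switch intuitively between the planned sub-tasks. Visual feedback from depth cameras monitors and estimates the object poses and human arm postures, while tactile data provides the exploration skills needed to detect and maintain the desired contacts and avoid object slippage. To evaluate the performance, effectiveness, and usability of the proposed framework, assembly and disassembly tasks performed by human-human and human-robot partners are considered and analyzed using distinct evaluation metrics: approach adaptation, grasp correction, task coordination latency, cumulative posture deviation, and task repeatability.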
Micro-expressions (MEs), among the most important psychological stress reactions, are spontaneous and transient facial expressions that can reveal genuine human emotions. Automatic ME recognition (MER) is therefore becoming increasingly crucial in affective computing, and it provides essential technical support for lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several spontaneous ME datasets have recently been built to alleviate this problem, the amount of available data remains small. To address this data shortage, we construct a dynamic spontaneous ME dataset with the largest ME data scale to date, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos elicited from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME in MER experiments to objectively verify the validity of the dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that DFME can facilitate research on automatic MER and provides a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
The interview is regarded as one of the most crucial steps in recruitment. To prepare fully for interviews with recruiters, job seekers usually practice with mock interviews among themselves. However, a mock interview with peers is generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like real interviewers. With the rapid growth of online recruitment in recent years, recruiters increasingly conduct interviews online, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector from the dialog generator, so that most parameters can be trained with ungrounded dialogs and with resume data, neither of which is low-resource. Evaluation results on a real-world job interview dialog dataset indicate that our approach achieves promising results in generating mock interviews. With EZInterviewer, we hope to make mock interview practice easier for job seekers.
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works use separate approaches to handle thing, stuff, and part predictions, without shared computation or task association. We aim to unify these tasks at the architectural level by designing the first end-to-end unified framework, named Panoptic-PartFormer. Moreover, we find that the previous metric, PartPQ, is biased toward PQ. To handle both issues, we make the following contributions. First, we design a meta-architecture that decouples part features from things/stuff features. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. Second, we propose a new metric, Part-Whole Quality (PWQ), to better measure this task from both pixel-region and part-whole perspectives. It can also decouple the errors of part segmentation and panoptic segmentation. Third, inspired by Mask2Former and building on our meta-architecture, we propose Panoptic-PartFormer++, which uses a new part-whole cross-attention scheme based on masked cross attention to further boost part segmentation quality. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with the previous Panoptic-PartFormer, Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost reduction of 70% in GFlops and 50% in parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
In contrast to control-theoretic methods, model-free reinforcement learning (RL) methods still lack a stability guarantee, which remains a significant problem. Jointly learning a policy and a Lyapunov function has recently become a promising approach to providing a stability guarantee for the whole system. However, the classical Lyapunov constraints that researchers have introduced cannot stabilize the system during sampling-based optimization. We therefore propose the Adaptive Stability Certification (ASC), which enables the system to reach sampling-based stability. Because the ASC condition can search for the optimal policy heuristically, we design the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm based on it. Our algorithm also avoids the optimization problem in current approaches, where a variety of constraints are coupled into the objective. When evaluated on ten robotic tasks, our method achieves lower accumulated cost and fewer stability constraint violations than previous studies.
Surgical robot automation has attracted increasing research interest over the past decade because of its huge potential to benefit surgeons, nurses, and patients. Recently, the learning paradigm of embodied AI has demonstrated a promising ability to learn good control policies for various complex tasks, and embodied AI simulators play an essential role in supporting the relevant research. However, existing open-source simulators for surgical robots still do not sufficiently support human interaction through physical input devices, which limits effective investigation of how human demonstrations affect policy learning. In this paper, we study human-in-the-loop embodied intelligence with a new interactive simulation platform for surgical robot learning. Specifically, we build our platform on our previously released SurRoL simulator, with several new features co-developed to allow high-quality human interaction via an input device. With these, we further propose to collect human demonstrations and imitate their action patterns to achieve more effective policy learning. We showcase the improvements of our simulation environment with the newly designed features and tasks, and validate state-of-the-art reinforcement learning algorithms in the interactive environment. Promising results are obtained, with which we hope to pave the way for future research on surgical embodied intelligence. Our platform is released and will be continuously updated on the website: https://med-air.github.io/SurRoL/